Small Cycles in Small Worlds 
We characterize the distributions of short cycles in a large metabolic network previously shown 
to have small world characteristics and a power law degree distribution. Compared with three 
classes of random networks, including Erdos-Renyi random graphs and synthetic small world 
networks of the same connectivity, the metabolic network has a particularly large number 
of triangles and a deficit in large cycles. Short cycles reduce the length of detours when a 
connection is clipped, so we propose that long cycles in metabolism may have been selected 
against in order to shorten transition times and reduce the likelihood of oscillations in response 
to external perturbations. 



Systems as diverse as the Western US power grid, 
metabolic networks of a cell, or the World Wide Web 
are well described as graphs with characteristic topology. 
Small world networks have received considerable attention 
since the seminal paper by Watts and Strogatz [1]. 

Most of the existing literature discusses small world 
networks in terms of the average path length between 
two vertices [2] or of the network's clustering coefficient 
[3, 4], which measures how close the neighborhood of 
each vertex comes on average to being a complete 
subgraph (clique) [1]. Barabási et al. [5, 6] focused on the 
degree distributions, finding a power law in a suite of 
real world examples including the World Wide Web and the 
US power grid. Recent work on the spread of epidemics 
on a small world network [7] emphasizes the importance 
of "far-reaching" edges. The idea is that clipping a far 
edge will force a (relatively) long detour in the network. 
Hence it is these edges that are responsible for the small 
diameter of the graph $G$. 

Let us look at detours in graphs in a more systematic 
way. Throughout this paper we will represent a network 
as a simple (unweighted, undirected) graph $G(V, E)$ with 
vertex set $V$ and edge set $E$. A cycle in $G$ is a closed path 
which meets each of its vertices and edges exactly once. 
The length of a cycle $C$, i.e., the number of its vertices 
or edges, is denoted by $|C|$. With each edge $e \in E$ we 
can associate the set $\mathcal{S}(e)$ containing the shortest cycles 
in $G$ that go through $e$. It is easily verified that a far 
edge in the sense of [7] is an edge that is not contained 
in a triangle. In other words, $e$ is a far edge if and only 
if $\mathcal{S}(e)$ does not contain a triangle. The cycles $C \in \mathcal{S}(e)$ 
determine the shortest detours (which have length $|C| - 1$) 
when $e$ is removed from the graph. 
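
The definitions above translate directly into a breadth-first search: a shortest cycle through $e = \{u, v\}$ is the edge itself plus a shortest $u$-$v$ path that avoids it. The following is a minimal sketch, not the authors' code; the function names and the toy graph (a pentagon with one chord) are illustrative assumptions. Bridges, which lie on no cycle at all, are counted as far edges here because they are not contained in a triangle.

```python
from collections import deque

def make_adj(edges):
    """Adjacency sets of a simple undirected graph given as vertex pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def shortest_cycle_length(adj, u, v):
    """Length |C| of a shortest cycle through edge {u, v}; None for a bridge."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if {x, y} == {u, v}:
                continue  # never use the clipped edge itself
            if y not in dist:
                dist[y] = dist[x] + 1
                if y == v:
                    return dist[y] + 1  # detour length plus the edge e
                queue.append(y)
    return None

def far_edges(adj):
    """Edges contained in no triangle (shortest cycle longer than 3, or none)."""
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    return {e for e in edges if shortest_cycle_length(adj, *tuple(e)) != 3}

# Hypothetical toy graph: pentagon 0-1-2-3-4 with the chord {0, 2}.
toy = make_adj([(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)])
print(shortest_cycle_length(toy, 3, 4))           # detour has length |C| - 1
print(sorted(sorted(e) for e in far_edges(toy)))
```

In the toy graph the three edges of the triangle 0-1-2 are not far edges, while the remaining three edges each require a detour of length 3.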

It seems natural to consider the set $\mathcal{S}(G) = \bigcup_{e \in E} \mathcal{S}(e)$ 
of shortest cycles of all edges in $G$ and to study, e.g., their 
length distribution. However, as the example in Fig. 1 
shows, the shortest cycles $\mathcal{S}(G)$ do not convey the 
complete information about the graph. Additional cycles 
appear to be relevant, such as the hexagon in Fig. 1. 



FIG. 1: $\mathcal{S}(G)$ consists of the twelve triangles only. The 
hexagon (bold edges), however, is obviously crucial for 
the network structure. 

In order to extend $\mathcal{S}(G)$ to a more complete collection 
of cycles we need some more information on the cycle 
structure of graphs. Recall that the set of all subsets 
of $E$ forms an $|E|$-dimensional vector space over $\{0, 1\}$ 
(with addition and multiplication modulo 2). Vector 
addition in this edge space is given by the symmetric 
difference $X \oplus Y = (X \cup Y) \setminus (X \cap Y)$. The cycle space $\mathcal{C}$, 
consisting of all cycles and edge-disjoint unions of 
cycles in $G$, is a particularly important subspace of the 
edge space [8]. Its dimension is the cyclomatic number 
$\nu(G) = |E| - |V| + c(G)$, where $c(G)$ is the number of 
connected components of $G$. The length $\ell(B)$ of a basis 
$B$ of the cycle space (cycle basis for short) is the sum of 
the lengths of its cycles: $\ell(B) = \sum_{C \in B} |C|$. A minimum 
cycle basis (MCB) is a cycle basis with minimum length. 
MCBs have the property that their longest cycle is at 
most as long as the longest cycle of any basis of $\mathcal{C}$ [9]. An 
MCB therefore contains the salient information about the 
cycle structure of a graph in its most compressed form. 
Most graphs, however, do not have a unique MCB. On 
the other hand, the distribution of cycle lengths is the 
same in all MCBs of a given graph [10]. The way to 
avoid ambiguities is to consider the union of all minimum 
cycle bases, also known as the set $\mathcal{R}(G)$ of relevant 
cycles. The term "relevant" is justified by two important 
properties of $\mathcal{R}(G)$: (i) a cycle is relevant if and only if 
it cannot be written as an $\oplus$-sum of shorter cycles [11], 
and (ii) the shortest cycles through an edge are relevant, 
i.e., $\mathcal{S}(G) \subseteq \mathcal{R}(G)$ [?]. Consequently, the composition 
of $\mathcal{R}(G)$ in terms of number and length distribution 
of cycles is an important characteristic of a graph. The 
numerical studies below make use of Vismara's [11] 
algorithm for computing $\mathcal{R}(G)$, which is based on Horton's 
MCB algorithm [?]. 
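
To make the edge-space arithmetic concrete, the sketch below represents cycles as sets of edges, so that Python's set operator `^` (symmetric difference) plays the role of the $\oplus$-sum. It is only an illustration under stated assumptions: the helper names are hypothetical, and the example graph is the complete graph on four vertices (a tetrahedron), chosen because its four triangular faces are linearly dependent.

```python
def cyclomatic_number(vertices, edges):
    """nu(G) = |E| - |V| + c(G), with c(G) counted by union-find."""
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = len(parent)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return len(edges) - len(parent) + components

def cycle(*verts):
    """Edge set of the cycle that visits verts in the given order."""
    k = len(verts)
    return {frozenset((verts[i], verts[(i + 1) % k])) for i in range(k)}

# Tetrahedron K4: 4 vertices, 6 edges, one component.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
nu = cyclomatic_number(range(4), k4_edges)

# The four triangular faces; any three of them sum to the fourth.
t012, t013, t023, t123 = cycle(0, 1, 2), cycle(0, 1, 3), cycle(0, 2, 3), cycle(1, 2, 3)
fourth_face = t012 ^ t013 ^ t023
print(nu, fourth_face == t123)
```

Here every MCB consists of three of the four triangles, while all four triangles are relevant because none of them is a sum of shorter cycles. This is the smallest case in which the relevant cycles outnumber the cycles of any single MCB.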

The most common model of graph evolution, introduced 
by Erdős and Rényi [14], assumes a fixed number 
$n = |V|$ of vertices and assigns edges independently with 
a certain probability $p$ [15]. In many cases ER random 
graphs turn out to be quite different from a network of 
interest. The Watts-Strogatz [1] model of small world 
networks starts with a deterministic graph, usually a 
circular arrangement of vertices in which each vertex is 
connected to $k$ nearest neighbors on each side. Then edges 
are "rewired" (in the original version) or added [7, 16] 
with probability $p$. We shall consider the latter model 
for $k = 1$, denoted SW1 below, which corresponds to 
adding random edges to a Hamiltonian cycle. Both ER 
and SW1 graphs exhibit an approximately Gaussian 
degree distribution. 
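
The two constructions can be sketched in a few lines. This is an illustrative sketch rather than the code used for the numerical studies; the function names are hypothetical, and the SW1 variant assumes that every non-ring vertex pair receives an edge independently with probability $p$.

```python
import random

def er_graph(n, p, rng):
    """Erdos-Renyi graph: each of the n(n-1)/2 pairs is an edge with probability p."""
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}

def sw1_graph(n, p, rng):
    """SW1 graph (n >= 3): a Hamiltonian cycle plus random chords, each with probability p."""
    ring = {tuple(sorted((i, (i + 1) % n))) for i in range(n)}
    chords = {(i, j) for i in range(n) for j in range(i + 1, n)
              if (i, j) not in ring and rng.random() < p}
    return ring | chords

def mean_degree(n, edges):
    """Average vertex degree d = 2|E|/n."""
    return 2 * len(edges) / n

rng = random.Random(0)  # fixed seed so that the sketch is reproducible
n, p = 200, 0.05
print(mean_degree(n, er_graph(n, p, rng)))   # fluctuates around p(n - 1)
print(mean_degree(n, sw1_graph(n, p, rng)))  # fluctuates around 2 + p(n - 3)
```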

In many real networks, however, the degree distribution 
follows a power law. Barabási et al. [5, 6] show 
that the scale invariant behavior of the degree distributions 
can be explained in terms of a simple graph evolution 
model (AB model): Starting from a small core graph, at 
each time step a vertex is added together with $m$ edges 
that are connected to each previously present vertex $k$ 
with probability $\Pi(k) = d(k)/\sum_j d(j)$, where $d(j)$ is the 
degree of vertex $j$. In this contribution we will focus 
mostly on the AB model instead of Watts' original 
construction, because we will apply the analysis of the cycle 
structure to an empirical network for which a power-law 
like degree distribution has been established. This 
network is the system of all chemical reactions required for 
the synthesis of small-molecule building blocks and 
energy in the bacterium Escherichia coli. Its structure is 
described in ref. [17]. Such chemical reaction networks are 
often referred to as metabolic networks. 
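
A common way to realize the attachment rule $\Pi(k) = d(k)/\sum_j d(j)$ is to keep a list in which every vertex appears once per incident edge, so that a uniform draw from the list is a degree-proportional draw. The sketch below follows that idea under two assumptions that the text does not specify: the core graph is taken to be the complete graph on $m + 1$ vertices, and repeated targets are redrawn so that the graph stays simple. The function name is hypothetical.

```python
import random

def ab_graph(n, m, rng):
    """Preferential-attachment sketch: returns a set of edges (k, new) with k < new."""
    # Assumed core graph: complete graph on m + 1 vertices.
    edges = {(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)}
    # Every vertex occurs in `endpoints` once per incident edge,
    # so rng.choice(endpoints) picks vertex k with probability d(k) / sum_j d(j).
    endpoints = [v for e in sorted(edges) for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # redraw duplicates (assumption: simple graph)
            targets.add(rng.choice(endpoints))
        for k in sorted(targets):
            edges.add((k, new))
            endpoints.extend((k, new))
    return edges

edges = ab_graph(n=1000, m=3, rng=random.Random(0))
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
# Early vertices typically end up with much larger degrees than late ones.
print(max(degree.values()), min(degree.values()))
```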

It is clear that all triangles in a graph are relevant, 
since a triangle is necessarily a shortest cycle through each 
of its edges. Hence $|\mathcal{R}(G)| \ge \Delta$, where $\Delta$ denotes the number 
of triangles in $G$. We expect $\langle\Delta\rangle_{\rm ER} = \binom{n}{3} p^3$ triangles 
in an ER random graph with edge-drawing probability $p$. 
For the SW1 graphs we obtain a similar expression: 

$$\langle\Delta\rangle_{\rm SW1} = np + n(n-4)p^2 + \frac{1}{6}\,n(n^2 - 9n + 20)\,p^3 . \qquad (1)$$

The MCB will therefore consist almost exclusively of 
triangles if $\Delta \gg \nu(G)$. The average vertex degree is 
$d = 2|E|/n = p(n-1)$ for ER and $d = 2 + p(n-3)$ for 
SW1, respectively. Assuming that $n$ is large, we expect to find 
only triangles in $\mathcal{R}(G)$ for $d \gg \sqrt{3n}$. Numerical 
simulations show that this is indeed the case, see Fig. 2. In this 
regime, we have $|\mathcal{R}(G)| \approx d^3/6$, and the graph contains 
no far edges. Not surprisingly, there is little difference 
between SW1 and ER random graphs for large $n$. 

FIG. 2: Relevant non-triangles in ER (□), SW1 (△), and 
AB (•) random graphs with n = 100. 
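
The three terms of Eq. (1) correspond to triangles that use two, one, or no edges of the Hamiltonian cycle, and therefore need one, two, or three random chords. The sketch below checks the coefficients by brute-force enumeration of vertex triples on a small ring. The function names are hypothetical, and the enumeration assumes $n \ge 4$ so that no triple contains three ring edges.

```python
from itertools import combinations
from math import comb

def expected_triangles_er(n, p):
    """<Delta>_ER = C(n, 3) p^3."""
    return comb(n, 3) * p**3

def expected_triangles_sw1(n, p):
    """<Delta>_SW1 as given in Eq. (1)."""
    return n * p + n * (n - 4) * p**2 + n * (n * n - 9 * n + 20) * p**3 / 6

def sw1_triple_classes(n):
    """Number of vertex triples containing 0, 1, or 2 ring edges (n >= 4)."""
    ring = {frozenset((i, (i + 1) % n)) for i in range(n)}
    counts = {0: 0, 1: 0, 2: 0}
    for a, b, c in combinations(range(n), 3):
        k = sum(frozenset(e) in ring for e in ((a, b), (b, c), (a, c)))
        counts[k] += 1
    return counts

n, p = 10, 0.3
counts = sw1_triple_classes(n)
# A triple with k ring edges becomes a triangle if its 3 - k chords are all present.
by_enumeration = sum(c * p ** (3 - k) for k, c in counts.items())
print(counts, by_enumeration, expected_triangles_sw1(n, p))
```

For $n = 10$ the enumeration finds 10, 60, and 50 triples with two, one, and no ring edges, in agreement with $n$, $n(n-4)$, and $n(n^2 - 9n + 20)/6$.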

Since the AB model is based on a fixed vertex degree 
$d$, it should be compared to random graph models with 
given vertex degree $d$, not with given edge-drawing 
probabilities $p$. We have an asymptotically constant number 
of triangles for both ER and SW1: $\Delta_{\rm ER} \to d^3/6$ and 
$\Delta_{\rm SW1} \to d^3/6 - d + 2/3$, respectively. Note that as a 
consequence the clustering coefficient vanishes asymptotically. 
In SW networks with a priori connectivity $k > 1$ we find 
of course a number of triangles that grows at least 
linearly with $n$, since the initial ($p = 0$) network already 
contains $(k - 1)n$ triangles. The clustering coefficient 
stays finite for large $n$ [18]. 
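
For completeness, the two limits can be obtained by expressing $p$ through the average degree $d$ and letting $n \to \infty$ at fixed $d$. The following worked steps are a reconstruction from the formulas for $\langle\Delta\rangle$ and $d$ given above, not a quotation from the source:

```latex
\begin{align*}
\text{ER:}\quad p = \frac{d}{n-1}
  &\;\Rightarrow\; \binom{n}{3} p^3 = \frac{n(n-2)}{6(n-1)^2}\, d^3 \;\to\; \frac{d^3}{6}, \\
\text{SW1:}\quad p = \frac{d-2}{n-3}
  &\;\Rightarrow\; np \to d-2, \quad n(n-4)p^2 \to (d-2)^2, \quad
     \tfrac{1}{6}\, n(n^2-9n+20)\, p^3 \to \tfrac{1}{6}(d-2)^3, \\
(d-2) + (d-2)^2 + \tfrac{1}{6}(d-2)^3
  &= \frac{d^3}{6} - d + \frac{2}{3}.
\end{align*}
```

Since the number of triangles stays bounded while the number of connected vertex triples grows linearly with $n$, the fraction of closed triples tends to zero.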

The large vertex degree of the "early" vertices in the 
AB model suggests that there should be many more 
triangles than in ER or SW1 models. The expected degree of 
vertex $s$ at "time" $t$ is known [?]: $d(s|t) = m[\sqrt{t/s} - 1]$. 
The probability of an edge between $s$ and $t$, $t > s$, is 
therefore $p_{st} = m\,d(s|t-1)/[2(t-1)m]$, where $2(t-1)m$ 
is the sum of the vertex degrees at "time" $t - 1$. Thus 
$\langle\Delta\rangle \approx \sum_{r<s<t} p_{rs}\,p_{st}\,p_{rt}$. This can be approximated by 

$$\langle\Delta\rangle \approx \frac{m^3}{8} \sum_{1<r<s<t} \left(\sqrt{s/r} - 1\right)\left(\sqrt{t/s} - 1\right)\left(\sqrt{t/r} - 1\right) \frac{1}{s\,t^2} \approx C\,m^3 \ln^3 n + O(\ln^2 n) \qquad (2)$$


Fig. 3 shows $\Delta$ for typical AB random graphs with $m = 
2, \ldots, 8$ as a function of "time". The behavior of $\Delta$ in an 
individual growing network is well represented by eq. (2). 
The number |7?.| — A of non-trivial relevant cycles has 
its maximum around |-E| ~ 0.7An^'^ independent of the 
model. The scaling of their number is consistent with 
|7^| - A ~ Cn^/'^, where the constant G w 0.036 is the 
same for ER and SWl random graphs and C w 0.016 for 
the AB models. For small vertex degrees, d <^ \V\^^'^ we 
find TZ{G) « i^{G), i.e., the MCB is (almost) unique. 
FIG. 3: Triangles in AB models with different values of $m$. 

FIG. 4: Mean length of a relevant cycle in AB networks. 



The cyclomatic number of an AB random graph is 
$\nu(G) \approx (m/2 - 1)n$. Hence eq. (2) implies that almost 
all relevant cycles must be long. Fig. 4 shows that the 
average length of a relevant cycle grows logarithmically 
with $n$. Not surprisingly, the slopes decrease with $m$. 

TABLE I: Cycle Structure of Metabolic Networks. 

Model    Set            |C| = 3      4      5      6      7      8      9      Σ
Ecoli1   MCB                282     51     19     20      3      5      1    381
         $\mathcal{R}$      379    114     90     83      5     36     16    723
         $\mathcal{S}$      379     56     24     42      2     14     16    533
AB       MCB                 78    158    124     20    0.4   0.01           380
         $\mathcal{R}$       81    285    527    161    5.5    0.4          1060
         $\mathcal{S}$       81    273    414    144    5.5    0.4           918
ER       MCB                 18     58    163    131     11    0.4           381
         $\mathcal{R}$       18     61    212    528     82    3.2           904
         $\mathcal{S}$       18     61    205    311     68    3.2           666
SW1      MCB                 15     46    131    167     21    1.1   0.03    381
         $\mathcal{R}$       15     48    157    427    151    7.1    0.2    805
         $\mathcal{S}$       15     48    155    301    108    6.5    0.2    634

Let us now turn to an example of metabolic networks. 
Because it is germane to their functional analysis, we 
first point out a nexus between graph representations of 
metabolic networks and metabolic flux analysis (MFA), 
the most generic framework to analyze the biological 
function of metabolic networks. A graph representation 
of metabolic networks was introduced as a substrate graph 
$\Sigma$ in [?]. Its vertices are the molecular compounds 
(substrates); two substrates $k$ and $l$ are adjacent in $\Sigma$ if they 
participate in the same reaction $r$. Substrate graphs are 
undirected because directed graphs would not properly 
represent the propagation of perturbations: even for 
irreversible reactions the product concentration may 
affect the reaction rate, for instance by product 
occupancy of the enzyme's active site; this in turn affects the 
substrate concentration. Thus, perturbations may travel 
backwards even from irreversible reactions. A similar 
argument for considering undirected graphs can be derived 
from metabolic control theory [?]. 
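
The substrate graph construction amounts to turning every reaction into a clique on its participating compounds. The sketch below illustrates this with two made-up reactions; the reaction names, the compound names, and the helper functions are hypothetical and are not taken from the E. coli data.

```python
from itertools import combinations

def substrate_graph(reactions):
    """Undirected substrate graph: two compounds are adjacent if they share a reaction."""
    edges = set()
    for compounds in reactions.values():
        for a, b in combinations(sorted(set(compounds)), 2):
            edges.add((a, b))
    return edges

def count_triangles(edges):
    """Number of triangles; each one is seen once from each of its three edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return sum(len(adj[a] & adj[b]) for a, b in edges) // 3

# Hypothetical reactions: r1 is A + B -> C + D, r2 is D -> E.
reactions = {"r1": ["A", "B", "C", "D"], "r2": ["D", "E"]}
graph = substrate_graph(reactions)
print(len(graph), count_triangles(graph))
```

A single reaction with four participants already contributes a 4-clique with four triangles, whereas a monomolecular reaction contributes only one edge.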

The key ingredient of MFA is the stoichiometric matrix 
$\mathbf{S}$. Its entries are the stoichiometric coefficients $s_{kr}$, i.e., 
the number of molecules of species $k$ produced ($s_{kr} > 0$) 
or consumed ($s_{kr} < 0$) in each reaction $r$. Reversible 
reactions are entered as two separate reactions in most 
references. In general, additional "pseudo-reactions" are 
added to describe the interface of the metabolic reaction 
network with its environment. Stationary flux vectors $f$ 
in the network satisfy $\mathbf{S}f = 0$ and $f_r \ge 0$ for each 
reaction $r$, see e.g. [21, 22, 23, 24, 25, 26]. It is not hard 
to see that if all reactions are monomolecular, then $\mathbf{S}$ is 
the incidence matrix of a directed graph; the stationary 
flux vectors span the cycle space of this graph. The close 
connection between the cycle space of a directed graph 
and its underlying undirected graph [27] allows us to use 
the relevant cycles of the substrate graph $\Sigma$ to describe 
the structure of the metabolic network in a way 
complementary to that provided by MFA. 
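
The monomolecular case can be checked on a small example. In the sketch below every reaction converts exactly one species into one other, so each column of the stoichiometric matrix has a single $-1$ and a single $+1$, which is the incidence matrix of a directed graph. The four reactions and the function names are illustrative assumptions only.

```python
def stoichiometric_matrix(species, reactions):
    """Rows are species, columns are reactions; consumed < 0, produced > 0."""
    index = {s: i for i, s in enumerate(species)}
    S = [[0] * len(reactions) for _ in species]
    for r, (substrates, products) in enumerate(reactions):
        for s in substrates:
            S[index[s]][r] -= 1
        for s in products:
            S[index[s]][r] += 1
    return S

def is_stationary(S, f):
    """True if S f = 0 and every flux component is non-negative."""
    balanced = all(sum(s * v for s, v in zip(row, f)) == 0 for row in S)
    return balanced and all(v >= 0 for v in f)

# Hypothetical monomolecular network: A -> B, B -> C, C -> A, A -> C.
species = ["A", "B", "C"]
reactions = [(["A"], ["B"]), (["B"], ["C"]), (["C"], ["A"]), (["A"], ["C"])]
S = stoichiometric_matrix(species, reactions)

print(is_stationary(S, [1, 1, 1, 0]))  # flux around the directed cycle A -> B -> C -> A
print(is_stationary(S, [0, 0, 1, 1]))  # flux around the directed cycle A -> C -> A
print(is_stationary(S, [1, 0, 0, 0]))  # a single reaction alone is not balanced
```

In this example the stationary fluxes are exactly the non-negative combinations of the two directed cycles.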

For our analysis of metabolic graphs, we use the 
substrate graph of the Ecoli1 core metabolism, a set of 
chemical reactions representing the central routes of 
energy metabolism and small-molecule building block 
synthesis. Similar to [?], we omit the following substrates 
from the graph: CO2, NH3, SO4, AMP, ADP, and ATP, 
their deoxy-derivatives, both the oxidized and reduced 
form of thioredoxin, organic phosphate and 
pyrophosphate. The resulting graph has $n = 272$ vertices and 
$|E| = 652$ edges. It is analyzed below. 
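
As a quick consistency check, the size of any cycle basis of this graph follows from the cyclomatic number $\nu = |E| - |V| + c$. The sketch below assumes that the graph is connected ($c = 1$), which is not stated explicitly, and that the Ecoli1 MCB entries of Table I belong to cycle lengths 3 through 9.

```python
n_vertices, n_edges = 272, 652
components = 1  # assumption: the substrate graph is connected

nu = n_edges - n_vertices + components  # cyclomatic number = size of every cycle basis

# Ecoli1 MCB row of Table I, keyed by cycle length (lengths 3..9 assumed).
mcb_by_length = {3: 282, 4: 51, 5: 19, 6: 20, 7: 3, 8: 5, 9: 1}
mcb_size = sum(mcb_by_length.values())

mean_degree = 2 * n_edges / n_vertices
print(nu, mcb_size, round(mean_degree, 2))
```

Under these assumptions the cyclomatic number and the total of the MCB row coincide.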

Table I shows that the three random models AB, SW1, 
and ER agree at least qualitatively with each other. The 
AB random graphs exhibit a much broader distribution 
of cycle sizes (not shown) than the ER and SW1 
models. As a consequence, the average cycle numbers for 
ER and SW1 have a statistical uncertainty of about 2%, 
while the uncertainty of the AB values is 5 to 10 times 
higher. Note that ER and SW1 have a similar number 
of relevant cycles, but the cycles are slightly longer in 
SW1. Two features distinguish the metabolic network 
Ecoli1 from all random networks: (1) The number $\Delta$ of 
triangles is almost 10 times larger than expected. This 
can be explained at least in part as a consequence of 
the substrate graph representation: multi-molecular 
reactions translate to cliques and hence a large number of 
triangles. The ratio $282/379 \approx 0.744$ indicates that in 
fact almost all triangles are contained in 4-cliques, since 
in each 4-clique we have three triangles that belong to a 
particular MCB, while the fourth face of the tetrahedron 
is their $\oplus$-sum [?]. (2) There is a much smaller number 
of relevant pentagons and hexagons, which results in an 
overall somewhat reduced number of relevant cycles: 723 
compared to about 1060 (AB), 904 (ER), and 805 (SW1). 

Strictly speaking, we do not know the biological 
significance of this relative paucity of longer cycles. However, 
we would like to venture a speculation. Organisms are 
constantly exposed to environmental fluctuations 
requiring transitions in metabolic states. That is, a metabolic 
network needs to produce different outputs depending 
on the environment. Environments may vary rapidly, 
requiring rapid transitions between metabolic states. 
Possibly, networks with long cycles have longer transition 
times, because environmental perturbations may lead to 
prolonged oscillations in such networks. The dynamical 
system representation of metabolic networks required to 
test this idea rigorously lies beyond the scope of this 
article. 

The longest relevant cycles in a metabolic network are 
of particular interest since they reflect parts of the 
network that cannot easily be replaced by alternative routes. 
In Fig. 5 we show the largest such cycle in Ecoli1. We 
emphasize that the cycles in our analysis represent routes 
for transmission of perturbations, but not necessarily of 
mass, as is commonly considered in MFA. This is 
apparent from Fig. 5, which does not correspond to a 
pathway from a biochemical chart, but links several pathways 
together. 

FIG. 5: The subgraph of Ecoli1 spanned by the relevant 
cycles of length 9. Two of these long cycles are 
highlighted. The edges shown in bold are part of each of the 
16 relevant 9-cycles. 